Partial Parsing via Finite-state Cascades

Abstract

Finite-state cascades represent an attractive architecture for parsing unrestricted text. Deterministic parsers specified by finite-state cascades are fast and reliable. They can be extended at modest cost to construct parse trees with finite feature structures. Finally, such deterministic parsers do not necessarily involve trading off accuracy against speed: they may in fact be more accurate than exhaustive-search stochastic context-free parsers.

1 Finite-state Cascades

Of current interest in corpus-oriented computational linguistics are techniques for bootstrapping broad-coverage parsers from text corpora. The work described here is a step along the way toward a bootstrapping scheme that involves inducing a tagger from word distributions, a low-level "chunk" parser from a tagged corpus, and lexical dependencies from a chunked corpus. In particular, I describe a chunk parsing technique based on what I will call a finite-state cascade. Though I shall not address the question of inducing such a parser from a corpus, the parsing technique has been implemented and is being used in a project for inducing lexical dependencies from corpora in English and German. The resulting parsers are robust and very fast.

A finite-state cascade consists of a sequence of levels. Phrases at one level are built on phrases at the previous level, and there is no recursion: phrases never contain same-level or higher-level phrases. Two levels of special importance are the level of chunks and the level of simplex clauses (Abney 1990b; Abney 1990a). Chunks are the non-recursive cores of "major" phrases, i.e., NP, VP, PP, AP, AdvP. Simplex clauses are clauses in which embedded clauses have been turned into siblings: tail recursion has been replaced with iteration, so to speak. To illustrate, Table 1 shows a parse tree represented as a sequence of levels. Parsing consists of a series of finite transductions, represented by the T_i in (0).
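The level-by-level idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation the paper refers to: the tag set (D, A, N, P, V), the three-level toy grammar, the example sentence, and the helper names `compile_pattern`, `apply_level`, `parse`, and `bracket` are all hypothetical. Each level is a list of (category, pattern) rules over the labels produced by the level below; at each position the rule with the longest greedy match wins, and a symbol that no rule matches is passed through unchanged.

```python
import re

def compile_pattern(spec):
    """Turn a spec such as "D? A* N+" into a regex over a string of
    space-terminated labels (e.g. "D N P ")."""
    parts = []
    for tok in spec.split():
        m = re.fullmatch(r"([A-Za-z]+)([?*+]?)", tok)
        label, op = m.group(1), m.group(2)
        parts.append(f"(?:{re.escape(label)} ){op}")
    return re.compile("".join(parts))

def apply_level(rules, nodes):
    """One transduction T_i: group runs of nodes into phrases of this level.
    A node is (label, word) for a tagged word or (label, [children])."""
    labels = [n[0] for n in nodes]
    out, i = [], 0
    while i < len(nodes):
        text = "".join(label + " " for label in labels[i:])
        best_cat, best_len = None, 0
        for cat, pat in rules:
            m = pat.match(text)
            if m:
                n = m.group(0).count(" ")  # number of labels consumed
                if n > best_len:
                    best_cat, best_len = cat, n
        if best_len > 0:
            out.append((best_cat, nodes[i:i + best_len]))
            i += best_len
        else:
            out.append(nodes[i])  # no rule applies: pass the symbol through
            i += 1
    return out

def parse(cascade, tagged):
    """Apply the levels in order; no level ever sees its own output."""
    nodes = list(tagged)
    for rules in cascade:
        nodes = apply_level(rules, nodes)
    return nodes

def bracket(node):
    label, body = node
    if isinstance(body, str):
        return body
    return "[" + label + " " + " ".join(bracket(c) for c in body) + "]"

# Hypothetical toy grammar: chunks, then PPs, then a simplex clause.
CASCADE = [
    [("NP", compile_pattern("D? A* N+")), ("VX", compile_pattern("V+"))],
    [("PP", compile_pattern("P NP"))],
    [("S", compile_pattern("NP PP* VX NP? PP*"))],
]

SENTENCE = [("D", "the"), ("N", "woman"), ("P", "in"), ("D", "the"),
            ("N", "lab"), ("N", "coat"), ("V", "bought"), ("D", "an"),
            ("A", "expensive"), ("N", "book")]

print(" ".join(bracket(n) for n in parse(CASCADE, SENTENCE)))
# [S [NP the woman] [PP in [NP the lab coat]] [VX bought] [NP an expensive book]]
```

Because every level consumes only the output of the level below it, a phrase can never contain a phrase of its own level or a higher one, which is the non-recursion property described above. The pass-through branch is what keeps this sketch from failing on input that the toy grammar does not cover.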
A number of researchers have applied finite transducers to natural-language parsing (Koskenniemi 1990; Koskenniemi et al. 1992; Roche 1993). Typically a transducer calculus is developed and syntactic analysis is accomplished by inserting syntactic
Similar resources
Efficient Online k-Best Lookup in Weighted Finite-State Cascades
Weighted finite-state transducers (WFSTs) have proved to be powerful and efficient aids for a variety of natural-language processing tasks, including automatic phonetization and phonological rule systems (Kaplan & Kay, 1994; Laporte, 1997), morphological analysis (Geyken & Hanneforth, 2006), and shallow syntactic parsing (Roche, 1997). In particular, cascades arising from the composition of two...
Explanation-Based Learning of Partial-Parsers
This paper presents a method for learning efficient parsers of natural language. The method consists of an Explanation-Based Learning (EBL) algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CFSTs), recognize and combine constituents efficient...
Workshop Notes of the ECML/MLnet Workshop on Empirical Learning of Natural Language Processing Tasks
This paper presents a method for learning efficient parsers of natural language. The method consists of an Explanation-Based Learning (EBL) algorithm for learning partial-parsers, and a parsing algorithm which combines partial-parsers with existing "full-parsers". The learned partial-parsers, implementable as Cascades of Finite State Transducers (CFSTs), recognize and combine constituents efficient...
A Grammatical Approach to the Extraction of Index Terms
The extraction of the keywords that characterize each document in a given collection is one of the most important components of an Information Retrieval system. In this article, we propose to apply shallow parsing, implemented by means of cascades of finite-state transducers, to extract complex index terms based on an approximate grammar of Spanish. The effectiveness of the index terms extracte...
Publication date: 1996